Abstract

We give lower bounds for the problem of stable sparse recovery from adaptive linear measurements. In this problem, one would like to estimate a vector $x \in \mathbb{R}^n$ from $m$ linear measurements $A_1 x, \ldots, A_m x$. One may choose each vector $A_i$ based on $A_1 x, \ldots, A_{i-1} x$, and must output $\hat{x}$ satisfying

$$\|\hat{x} - x\|_p \le (1 + \epsilon) \min_{k\text{-sparse } x'} \|x - x'\|_p$$

with probability at least $1 - \delta > 2/3$, for some $p \in \{1, 2\}$. For $p = 2$, it was recently shown that this is possible with $m = O(\frac{1}{\epsilon} k \log\log(n/k))$, while nonadaptively it requires $\Theta(\frac{1}{\epsilon} k \log(n/k))$. It is also known that even adaptively, it takes $m = \Omega(k/\epsilon)$ for $p = 2$. For $p = 1$, there is a non-adaptive upper bound of $\tilde{O}(\frac{1}{\sqrt{\epsilon}} k \log n)$. We show:

• For $p = 2$, $m = \Omega(\log\log n)$. This is tight for $k = O(1)$ and constant $\epsilon$, and shows that the $\log\log n$ dependence is correct.

• If the measurement vectors are chosen in $R$ "rounds", then $m = \Omega(R \log^{1/R} n)$. For constant $\epsilon$, this matches the previously known upper bound up to an $O(1)$ factor in $R$.

• For $p = 1$, $m = \Omega(k/(\sqrt{\epsilon} \cdot \log(k/\epsilon)))$. This shows that adaptivity cannot improve more than logarithmic factors, providing the analogue of the $m = \Omega(k/\epsilon)$ bound for $p = 2$.



1 Introduction 



Compressed sensing or sparse recovery studies the problem of solving underdetermined linear systems subject to a sparsity constraint. It has applications to a wide variety of fields, including data stream algorithms [Mut05], medical or geological imaging [CRT06, Don06], and genetics testing [SAZ10]. The approach uses the power of a sparsity constraint: a vector $x'$ is $k$-sparse if at most $k$ coefficients are non-zero. A standard formulation for the problem is that of stable sparse recovery: we want a distribution $\mathcal{A}$ of matrices $A \in \mathbb{R}^{m \times n}$ such that, for any $x \in \mathbb{R}^n$ and with probability $1 - \delta > 2/3$ over $A \in \mathcal{A}$, there is an algorithm to recover $\hat{x}$ from $Ax$ with

$$\|\hat{x} - x\|_p \le (1 + \epsilon) \min_{k\text{-sparse } x'} \|x - x'\|_p \qquad (1)$$

for some parameter $\epsilon > 0$ and norm $p$. We refer to the elements of $Ax$ as measurements. We say that Equation (1) denotes $\ell_p/\ell_p$ recovery.
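As an illustration of Equation (1), the following Python sketch checks the $\ell_p/\ell_p$ guarantee for a given estimate. It relies on the standard fact that the best $k$-sparse approximation keeps the $k$ largest-magnitude entries, so the right-hand side is $(1 + \epsilon)$ times the $\ell_p$ norm of the remaining "tail". The function names are hypothetical; this is only a checker for the guarantee, not a recovery algorithm.

```python
def lp_norm(v, p):
    """The l_p norm of a sequence of numbers."""
    return sum(abs(t) ** p for t in v) ** (1.0 / p)


def best_k_sparse_error(x, k, p):
    """min over k-sparse x' of ||x - x'||_p.

    The minimizer keeps the k largest-magnitude entries of x, so the error
    is the l_p norm of the remaining n - k entries (the "tail").
    """
    tail = sorted((abs(t) for t in x), reverse=True)[k:]
    return lp_norm(tail, p)


def satisfies_lp_guarantee(x, x_hat, k, eps, p):
    """Check Equation (1): ||x_hat - x||_p <= (1 + eps) * min_{k-sparse x'} ||x - x'||_p."""
    err = lp_norm([a - b for a, b in zip(x_hat, x)], p)
    return err <= (1 + eps) * best_k_sparse_error(x, k, p)


x = [10.0, -7.0, 0.5, 0.2, -0.1]
print(satisfies_lp_guarantee(x, [10.0, -7.0, 0.0, 0.0, 0.0], k=2, eps=0.1, p=1))  # True
print(satisfies_lp_guarantee(x, [9.0, -7.0, 0.0, 0.0, 0.0], k=2, eps=0.1, p=1))   # False
```

The second estimate fails because it pays an extra error of 1 on a head coordinate, which the tail budget of $1.1 \times 0.8$ cannot absorb.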

The goal is to minimize the number of measurements while still allowing efficient recovery of $x$. This problem has recently been largely closed: for $p = 2$, it is known that $m = \Theta(\frac{1}{\epsilon} k \log(n/k))$ is tight (upper bounds in [CRT06, GLPS10], lower bounds in [PW11, CD11]), and for $p = 1$ it is known that $m = \tilde{O}(\frac{1}{\sqrt{\epsilon}} k \log n)$ and $m = \tilde{\Omega}(\frac{k}{\sqrt{\epsilon}})$ [PW11] (recall that $\tilde{O}(f)$ means $O(f \log^c f)$ for some constant $c$, and similarly $\tilde{\Omega}(f)$ means $\Omega(f/\log^c f)$).

In order to further reduce the number of measurements, a number of recent works have considered making the measurements adaptive [JXC08, CHNR08, HCN09, HBCN09, MSW08, AWZ08, IPW11]. In this setting, one may choose each row of the matrix after seeing the results of previous measurements. More generally, one may split the adaptivity into $R$ "rounds", where in each round $r$ one chooses $A^r \in \mathbb{R}^{m_r \times n}$ based on $A^1 x, \ldots, A^{r-1} x$. At the end, one must use $A^1 x, \ldots, A^R x$ to output $\hat{x}$ satisfying Equation (1). We would still like to minimize the total number of measurements $m = \sum_i m_i$. In the $p = 2$ setting, it is known that for arbitrarily many rounds $O(\frac{1}{\epsilon} k \log\log(n/k))$ measurements suffice, and for $O(r \log^* k)$ rounds $O(\frac{1}{\epsilon} k r \log^{1/r}(n/k))$ measurements suffice [IPW11].

Given these upper bounds, two natural questions arise: first, is the improvement in the dependence on $n$ from $\log(n/k)$ to $\log\log(n/k)$ tight, or can the improvement be strengthened? Second, can adaptivity help by more than a logarithmic factor, by improving the dependence on $k$ or $\epsilon$?

A recent lower bound showed that $\Omega(k/\epsilon)$ measurements are necessary in a setting essentially equivalent to the $p = 2$ case [ACD11]^1. Thus, they answer the second question in the negative for $p = 2$. Their techniques rely on special properties of the 2-norm; namely, that it is a rotationally invariant inner product space and that the Gaussian is both 2-stable and a maximum entropy distribution. Such techniques do not seem useful for proving lower bounds for $p = 1$.



Our results. For $p = 2$, we show that any adaptive sparse recovery scheme requires $\Omega(\log\log n)$ measurements, or $\Omega(R \log^{1/R} n)$ measurements given only $R$ rounds. For $k = O(1)$, this matches the upper bound of [IPW11] up to an $O(1)$ factor in $R$. It thus shows that the $\log\log n$ term in the adaptive bound is necessary.

For $p = 1$, we show that any adaptive sparse recovery scheme requires $\tilde{\Omega}(k/\sqrt{\epsilon})$ measurements. This shows that adaptivity can only give $\mathrm{polylog}(n)$ improvements, even for $p = 1$. Additionally, our bound of $\Omega(k/(\sqrt{\epsilon} \cdot \log(k/\sqrt{\epsilon})))$ improves the previous non-adaptive lower bound for $p = 1$ and small $\epsilon$, which lost an additional $\log k$ factor [PW11].

1 Both our result and their result apply in both settings. See Appendix A for a more detailed discussion of the 
relationship between the two settings. 
Related work. Our work draws on the lower bounds for non-adaptive sparse recovery, most directly [PW11].

The main previous lower bound for adaptive sparse recovery gets $m = \Omega(k/\epsilon)$ for $p = 2$ [ACD11]. They consider going down a similar path to our $\Omega(\log\log n)$ lower bound, but ultimately reject it as difficult to bound in the adaptive setting. Combining their result with ours gives an $\Omega(\frac{1}{\epsilon} k + \log\log n)$ lower bound, compared with the $O(\frac{1}{\epsilon} k \cdot \log\log n)$ upper bound. The techniques in their paper do not imply any bounds for the $p = 1$ setting.

For $p = 2$ in the special case of adaptive Fourier measurements (where measurement vectors are adaptively chosen from among $n$ rows of the Fourier matrix), [HIKP12] shows $\Omega(k \log(n/k)/\log\log n)$
measurements are necessary. In this case the main difficulty with lower bounding adaptivity is 
avoided, because all measurement rows are chosen from a small set of vectors with bounded 
norm; however, some of the minor issues in using [PW11] for an adaptive bound were dealt with 
there. 

Our techniques. We use very different techniques for our two bounds. 

To show $\Omega(\log\log n)$ for $p = 2$, we reduce to the information capacity of a Gaussian channel. We consider recovery of the vector $x = e_{i^*} + w$, for $i^* \in [n]$ uniformly and $w \sim N(0, I_n/\Theta(n))$. Correct recovery must find $i^*$, so the mutual information $I(i^*; Ax)$ is $\Omega(\log n)$. On the other hand, in the nonadaptive case [PW11] showed that each measurement $A_j x$ is a power-limited Gaussian channel with constant signal-to-noise ratio, and therefore has $I(i^*; A_j x) = O(1)$. Linearity gives that $I(i^*; Ax) = O(m)$, so $m = \Omega(\log n)$ in the nonadaptive case. In the adaptive case, later measurements may "align" the row $A_j$ with $i^*$, to increase the signal-to-noise ratio and extract more information; this is exactly how the upper bounds work. To deal with this, we bound how much information we can extract as a function of how much we know about $i^*$. In particular, we show that given a small number $b$ of bits of information about $i^*$, the posterior distribution of $i^*$ remains fairly well "spread out". We then show that any measurement row $A_j$ can only extract $O(b + 1)$ bits from such a spread-out distribution on $i^*$. This shows that the information about $i^*$ increases at most exponentially, so $\Omega(\log\log n)$ measurements are necessary.
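To make the signal-to-noise intuition concrete, here is a small Python sketch, assuming the instance $x = e_{i^*} + N(0, \sigma^2 I_n/n)$ and a unit-norm row $v$. For a prior $p$ on $i^*$ the signal power is $\sum_j v_j^2 p_j$ and the noise power is $\sigma^2/n$, so a row spread evenly over a set holding all the posterior mass gains SNR in proportion to how concentrated that posterior is. The helper `spread_row` and the parameters are hypothetical illustrations, and $\frac{1}{2}\log_2(1 + \mathrm{SNR})$ is the usual power-limited Gaussian channel bound; the sketch does not reproduce the proof, which must handle arbitrary rows.

```python
import math


def snr_upper_bound(v, prior, sigma2):
    """SNR of y = v . x when x = e_{I*} + N(0, (sigma2 / n) I) and ||v||_2 = 1.

    Signal power: E[v_{I*}^2] = sum_j v_j^2 p_j.  Noise power: sigma2 / n.
    """
    n = len(v)
    signal = sum(vj * vj * pj for vj, pj in zip(v, prior))
    return signal / (sigma2 / n)


def gaussian_capacity_bits(snr):
    """Power-limited Gaussian channel: I(signal; signal + noise) <= 0.5 * log2(1 + SNR)."""
    return 0.5 * math.log2(1 + snr)


def spread_row(support, n):
    """Hypothetical unit-norm row spread evenly over the index set `support`."""
    support = list(support)
    v = [0.0] * n
    for j in support:
        v[j] = 1 / math.sqrt(len(support))
    return v


n, sigma2 = 1024, 1.0
row = spread_row(range(16), n)
uniform = [1 / n] * n                              # nothing known about i*: b = 0
concentrated = [1 / 16] * 16 + [0.0] * (n - 16)    # b = log2(n / 16) = 6 bits known
for prior in (uniform, concentrated):
    snr = snr_upper_bound(row, prior, sigma2)
    print(snr, gaussian_capacity_bits(snr))
```

With the uniform prior the row has SNR 1 (half a bit), while against a posterior uniform on 16 coordinates the aligned row reaches SNR $2^6 = 64$, about $b/2 = 3$ bits, consistent with the $O(b + 1)$ behavior described above.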

To show an $\tilde{\Omega}(k/\sqrt{\epsilon})$ bound for $p = 1$, we first establish a lower bound on the multiround distributional communication complexity of a two-party communication problem that we call $\mathrm{Multi}\ell_\infty$, for a distribution tailored to our application. We then show how to use an adaptive $(1 + \epsilon)$-approximate $\ell_1/\ell_1$ sparse recovery scheme $\mathcal{A}$ to solve the communication problem $\mathrm{Multi}\ell_\infty$, modifying the general framework of [PW11] for connecting non-adaptive schemes to communication complexity in order to now support adaptive schemes. By the communication lower bound for $\mathrm{Multi}\ell_\infty$, we obtain a lower bound on the number of measurements required of $\mathcal{A}$.

In the $\mathrm{Gap}\ell_\infty$ problem, the two players are given $x$ and $y$ respectively, and they want to approximate $\|x - y\|_\infty$ given the promise that all entries of $x - y$ are small in magnitude or there is a single large entry. The $\mathrm{Multi}\ell_\infty$ problem consists of solving multiple independent instances of $\mathrm{Gap}\ell_\infty$ in parallel. Intuitively, the sparse recovery algorithm needs to determine if there are entries of $x - y$ that are large, which corresponds to solving multiple instances of $\mathrm{Gap}\ell_\infty$. We prove a multiround direct sum theorem for a distributional version of $\mathrm{Gap}\ell_\infty$, thereby giving a distributional lower bound for $\mathrm{Multi}\ell_\infty$. A direct sum theorem for $\mathrm{Gap}\ell_\infty$ has been used before for proving lower bounds for non-adaptive schemes [PW11], but was limited to a bounded number of rounds due to the use of a bounded round theorem in communication complexity [BR11]. We instead use the information complexity framework [BJKS04] to lower bound the conditional mutual information between the inputs to $\mathrm{Gap}\ell_\infty$ and the transcript of any correct protocol for $\mathrm{Gap}\ell_\infty$ under a certain input distribution, and prove a direct sum theorem for solving $k$ instances of this problem. We need to condition on "help variables" in the mutual information which enable the players to embed single instances of $\mathrm{Gap}\ell_\infty$ into $\mathrm{Multi}\ell_\infty$ in a way in which the players can use a correct protocol on our input distribution for $\mathrm{Multi}\ell_\infty$ as a correct protocol on our input distribution for $\mathrm{Gap}\ell_\infty$; these help variables are in addition to help variables used for proving lower bounds for $\mathrm{Gap}\ell_\infty$, which is itself proved using information complexity. We also look at the conditional mutual information with respect to an input distribution which doesn't immediately fit into the information complexity framework. We relate the conditional information of the transcript with respect to this distribution to that with respect to a more standard distribution.

2 Notation 

We use lower-case letters for fixed values and upper-case letters for random variables. We use $\log x$ to denote $\log_2 x$, and $\ln x$ to denote $\log_e x$. For a discrete random variable $X$ with probability distribution $p$, we use $H(X)$ or $H(p)$ to denote its entropy

$$H(X) = H(p) = \sum_x -p(x) \log p(x).$$

For a continuous random variable $X$ with pdf $p$, we use $h(X)$ to denote its differential entropy

$$h(X) = \int_{x \in X} -p(x) \log p(x) \, dx.$$

Let $y$ be drawn from a random variable $Y$. Then $(X \mid y) = (X \mid Y = y)$ denotes the random variable $X$ conditioned on $Y = y$. We define $h(X \mid Y) = \mathbb{E}_{y \sim Y} h(X \mid y)$. The mutual information between $X$ and $Y$ is denoted $I(X; Y) = h(X) - h(X \mid Y)$.
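For discrete variables these definitions can be evaluated directly. The following Python sketch is a discrete analogue written only for illustration: it computes $H(p)$ and $I(X; Y) = H(X) - H(X \mid Y)$ from a joint pmf, with logs in base 2 to match the convention $\log x = \log_2 x$. The function names are hypothetical.

```python
import math
from collections import defaultdict


def entropy(probs):
    """H(p) = sum_x -p(x) log2 p(x); zero-probability outcomes contribute 0."""
    return -sum(q * math.log2(q) for q in probs if q > 0)


def mutual_information(joint):
    """I(X; Y) = H(X) - H(X | Y) for a joint pmf given as {(x, y): probability}."""
    px, py = defaultdict(float), defaultdict(float)
    for (x, y), q in joint.items():
        px[x] += q
        py[y] += q
    # H(X | Y) = E_{y ~ Y} H(X | Y = y), the discrete analogue of h(X | Y) above.
    h_x_given_y = sum(
        qy * entropy([q / qy for (_, y2), q in joint.items() if y2 == y])
        for y, qy in py.items()
    )
    return entropy(px.values()) - h_x_given_y


copy = {(0, 0): 0.5, (1, 1): 0.5}
independent = {(a, b): 0.25 for a in (0, 1) for b in (0, 1)}
print(mutual_information(copy), mutual_information(independent))  # 1.0 0.0
```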

For $p \in \mathbb{R}^n$ and $S \subseteq [n]$, we define $p_S \in \mathbb{R}^n$ to equal $p$ over indices in $S$ and zero elsewhere.

We use $f \lesssim g$ to denote $f = O(g)$.

3 Tight lower bound for p = 2, k = 1 

We may assume that the measurements are orthonormal, since this can be performed in post- 
processing of the output, by multiplying Ax on the left to orthogonalize A. We will give a lower 
bound for the following instance: 

Alice chooses random $i^* \in [n]$ and i.i.d. Gaussian noise $w \in \mathbb{R}^n$ with $\mathbb{E}[\|w\|_2^2] = \sigma^2 = \Theta(1)$, then sets $x = e_{i^*} + w$. Bob performs $R$ rounds of adaptive measurements on $x$, getting $y^r = A^r x = (y^r_1, \ldots, y^r_{m_r})$ in each round $r$. Let $I^*$ and $Y^r$ denote the random variables from which $i^*$ and $y^r$ are drawn, respectively. We will bound $I(I^*; Y^1, Y^2, \ldots, Y^R)$.

We may assume Bob is deterministic, since we are giving a lower bound for a distribution over inputs: for any randomized Bob that succeeds with probability at least $1 - \delta$, there exists a choice of random seed such that the corresponding deterministic Bob also succeeds with probability at least $1 - \delta$.

First, we give a bound on the information received from any single measurement, depending on 
Bob's posterior distribution on I* at that point: 

Lemma 3.1. Let $I^*$ be a random variable over $[n]$ with probability distribution $p_i = \Pr[I^* = i]$, and define

$$b = \sum_{i=1}^{n} p_i \log(n p_i)$$

(equivalently, $b = \log n - H(p)$). Define $X = e_{I^*} + N(0, I_n \sigma^2/n)$. Consider any fixed vector $v \in \mathbb{R}^n$ independent of $X$ with $\|v\|_2 = 1$, and define $Y = v \cdot X$. Then

$$I(v_{I^*}; Y) \le C(b + 1)$$

for some constant $C$.

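Proof. Let $S_i = \{j \mid 2^i \le n p_j < 2^{i+1}\}$ for $i > 0$ and $S_0 = \{j \mid n p_j < 2\}$. Define $t_i = \sum_{j \in S_i} p_j = \Pr[I^* \in S_i]$. Then

$$\begin{aligned}
\sum_{i=0}^{\infty} i t_i &= \sum_{i>0} \sum_{j \in S_i} i\, p_j \le \sum_{i>0} \sum_{j \in S_i} p_j \log(n p_j) \\
&= b - \sum_{j \in S_0} p_j \log(n p_j) \\
&\le b - t_0 \log(n t_0 / |S_0|) \\
&\le b + |S_0| \log e / (ne)
\end{aligned}$$

using convexity and minimizing $x \log ax$ at $x = 1/(ae)$. Since $|S_0| \le n$ and $\log e / e < 1$, hence

$$\sum_{i=0}^{\infty} i t_i < b + 1. \qquad (2)$$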
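Equation (2) and the identity $b = \log n - H(p)$ are easy to check numerically. The Python sketch below builds the dyadic buckets $S_i$ and their masses $t_i$ for random distributions $p$ and compares $\sum_i i t_i$ with $b + 1$. The random test distributions (and the helper `random_distribution`) are an arbitrary choice for illustration, and a numerical check is of course not a proof.

```python
import math
import random


def information_b(p):
    """b = sum_i p_i log2(n p_i), which equals log2(n) - H(p)."""
    n = len(p)
    return sum(pi * math.log2(n * pi) for pi in p if pi > 0)


def bucket_masses(p):
    """t_i = Pr[I* in S_i] with S_0 = {j : n p_j < 2}, S_i = {j : 2^i <= n p_j < 2^(i+1)}."""
    n = len(p)
    t = {}
    for pj in p:
        i = 0 if n * pj < 2 else math.floor(math.log2(n * pj))
        t[i] = t.get(i, 0.0) + pj
    return t


def equation_2_sides(p):
    """Return (sum_i i * t_i, b + 1); Equation (2) asserts the first is smaller."""
    t = bucket_masses(p)
    return sum(i * ti for i, ti in t.items()), information_b(p) + 1


def random_distribution(n, rng):
    """A skewed random pmf on [n], used only to exercise the inequality."""
    w = [rng.random() ** rng.choice((1, 5, 20)) for _ in range(n)]
    total = sum(w)
    return [x / total for x in w]


rng = random.Random(0)
for _ in range(5):
    lhs, rhs = equation_2_sides(random_distribution(rng.randint(2, 500), rng))
    print(f"sum i*t_i = {lhs:.3f} < b + 1 = {rhs:.3f}")
```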

Let $W = N(0, \sigma^2/n)$. For any measurement vector $v$, let $Y = v \cdot X \sim v_{I^*} + W$. Let $Y_i = (Y \mid I^* \in S_i)$. Because $\|v\|_2^2 = 1$,

$$\mathbb{E}[Y_i^2] = \sigma^2/n + \sum_{j \in S_i} v_j^2 p_j / t_i \le \sigma^2/n + \|p_{S_i}\|_\infty / t_i \le \sigma^2/n + 2^{i+1}/(n t_i). \qquad (3)$$

Let $T$ be the (discrete) random variable denoting the $i$ such that $I^* \in S_i$. Then $Y$ is drawn from $Y_T$, and $T$ has probability distribution $t$. Hence

$$\begin{aligned}
h(Y) &\le h((Y, T)) \\
&= H(T) + h(Y_T \mid T) \\
&= H(t) + \sum_{i \ge 0} t_i h(Y_i) \\
&\le H(t) + \sum_{i \ge 0} t_i h(N(0, \mathbb{E}[Y_i^2]))
\end{aligned}$$

because the Gaussian distribution maximizes entropy subject to a power constraint. Using the same technique as the Shannon-Hartley theorem,

$$\begin{aligned}
I(v_{I^*}; Y) &= I(v_{I^*}; v_{I^*} + W) = h(v_{I^*} + W) - h(v_{I^*} + W \mid v_{I^*}) \\
&= h(Y) - h(W) \\
&\le H(t) + \sum_{i \ge 0} t_i \left( h(N(0, \mathbb{E}[Y_i^2])) - h(W) \right)
\end{aligned}$$

and hence by Equation (3),

$$I(v_{I^*}; Y) \le H(t) + \sum_{i \ge 0} \frac{t_i}{2} \log\left(1 + \frac{2^{i+1}}{t_i \sigma^2}\right). \qquad (4)$$

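All that remains is to show that this is $O(1 + b)$. Since $\sigma = \Theta(1)$, and $\log(1 + xyz) \le \log(1 + x) + \log(1 + y) + \log(1 + z)$ for $x, y, z \ge 0$, we have

$$\begin{aligned}
\sum_i \frac{t_i}{2} \log\left(1 + \frac{2^{i+1}}{t_i \sigma^2}\right) &\le \log\left(1 + \frac{2}{\sigma^2}\right) + \sum_i t_i \log(1 + 2^i) + \sum_i t_i \log\left(1 + \frac{1}{t_i}\right) \\
&\le O(1) + \sum_i t_i \log(1 + 2^i) + \sum_i t_i \log\left(1 + \frac{1}{t_i}\right). \qquad (5)
\end{aligned}$$

Now, $\log(1 + 2^i) \lesssim i$ for $i > 0$ and is $O(1)$ for $i = 0$, so by Equation (2),

$$\sum_i t_i \log(1 + 2^i) \lesssim 1 + \sum_i i t_i < 2 + b.$$

Next, $\log(1 + 1/t_i) \lesssim \log(1/t_i)$ for $t_i < 1/2$, so

$$\sum_i t_i \log(1 + 1/t_i) \lesssim \sum_{i \mid t_i < 1/2} t_i \log(1/t_i) + \sum_{i \mid t_i \ge 1/2} 1 \lesssim H(t) + 1.$$

Plugging into Equations (5) and (4),

$$I(v_{I^*}; Y) \lesssim 1 + b + H(t). \qquad (6)$$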
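To bound $H(t)$, we consider the partition $T_+ = \{i \mid t_i > 1/2^i\}$ and $T_- = \{i \mid t_i \le 1/2^i\}$. Then, using $\log(1/t_i) < i$ for $i \in T_+$ and Equation (2),

$$\begin{aligned}
H(t) &= \sum_i t_i \log(1/t_i) \\
&\le \sum_{i \in T_+} i t_i + \sum_{i \in T_-} t_i \log(1/t_i) \\
&\le 1 + b + \sum_{i \in T_-} t_i \log(1/t_i).
\end{aligned}$$

But $x \log(1/x)$ is increasing on $[0, 1/e]$ and is at most $\log e / e$ everywhere, so

$$\sum_{i \in T_-} t_i \log(1/t_i) \le t_0 \log(1/t_0) + t_1 \log(1/t_1) + \sum_{i \ge 2} \frac{1}{2^i} \log(2^i) \le \frac{2 \log e}{e} + \frac{3}{2} = O(1)$$

and hence $H(t) \le b + O(1)$. Combining with Equation (6) gives that

$$I(v_{I^*}; Y) \lesssim b + 1$$

as desired. □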
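The bound $H(t) \le b + O(1)$ can be exercised the same way. In the Python sketch below, which assumes the bucketing from the proof, the constant `SLACK` is $2 \log(e)/e + 3/2$, the $O(1)$ term obtained from the split into $T_+$ and $T_-$; the function reports both halves of the split so that each inequality in the proof can be checked separately. The random test distributions are an arbitrary, hypothetical choice.

```python
import math
import random


def bucket_masses(p):
    """t_i = Pr[I* in S_i] for the dyadic buckets S_i from the proof of Lemma 3.1."""
    n = len(p)
    t = {}
    for pj in p:
        i = 0 if n * pj < 2 else math.floor(math.log2(n * pj))
        t[i] = t.get(i, 0.0) + pj
    return t


def entropy_split(p):
    """Return (H(t), b, sum over T+ of i t_i, sum over T- of t_i log(1/t_i))."""
    n = len(p)
    b = sum(pi * math.log2(n * pi) for pi in p if pi > 0)
    t = bucket_masses(p)
    h_t = sum(ti * math.log2(1 / ti) for ti in t.values() if ti > 0)
    plus = sum(i * ti for i, ti in t.items() if ti > 2.0 ** -i)  # indices in T+
    minus = sum(ti * math.log2(1 / ti)                            # indices in T-
                for i, ti in t.items() if 0 < ti <= 2.0 ** -i)
    return h_t, b, plus, minus


# 2 * max_x x log2(1/x) + sum_{i >= 2} i / 2^i: the O(1) term in the proof.
SLACK = 2 * math.log2(math.e) / math.e + 1.5

rng = random.Random(0)
for _ in range(5):
    n = rng.randint(2, 500)
    w = [rng.random() ** rng.choice((1, 5, 20)) for _ in range(n)]
    total = sum(w)
    h_t, b, plus, minus = entropy_split([x / total for x in w])
    print(f"H(t) = {h_t:.3f} <= b + 1 + SLACK = {b + 1 + SLACK:.3f}")
```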

Theorem 3.2. Any scheme using $R$ rounds with number of measurements $m_1, m_2, \ldots, m_R > 0$ in each round has

$$I(I^*; Y^1, \ldots, Y^R) \le C^R \prod_i m_i$$

for some constant $C > 1$.



Proof. Let the signal in the absence of noise be $Z^r = A^r e_{I^*} \in \mathbb{R}^{m_r}$, and the signal in the presence of noise be $Y^r = A^r(e_{I^*} + N(0, \sigma^2 I_n/n)) = Z^r + W^r$ where $W^r = N(0, \sigma^2 I_{m_r}/n)$ independently. In round $r$, after observations $y^1, \ldots, y^{r-1}$ of $Y^1, \ldots, Y^{r-1}$, let $p^r$ be the distribution on $(I^* \mid y^1, \ldots, y^{r-1})$. That is, $p^r$ is Bob's posterior distribution on $I^*$ at the beginning of round $r$. We define

$$\begin{aligned}
b_r &= H(I^*) - H(I^* \mid y^1, \ldots, y^{r-1}) \\
&= \log n - H(p^r).
\end{aligned}$$

Because the rows of $A^r$ are deterministic given $y^1, \ldots, y^{r-1}$, Lemma 3.1 shows that any single measurement $j \in [m_r]$ satisfies

$$I(Z^r_j; Y^r_j \mid y^1, \ldots, y^{r-1}) \le C(b_r + 1)$$

for some constant $C$. Thus by Lemma B.1

$$I(Z^r; Y^r \mid y^1, \ldots, y^{r-1}) \le C m_r (b_r + 1).$$

There is a Markov chain $(I^* \mid y^1, \ldots, y^{r-1}) \to (Z^r \mid y^1, \ldots, y^{r-1}) \to (Y^r \mid y^1, \ldots, y^{r-1})$, so

$$I(I^*; Y^r \mid y^1, \ldots, y^{r-1}) \le I(Z^r; Y^r \mid y^1, \ldots, y^{r-1}) \le C m_r (b_r + 1).$$

We define $B_r = I(I^*; Y^1, \ldots, Y^{r-1}) = \mathbb{E}_y\, b_r$. Therefore

$$\begin{aligned}
B_{r+1} &= I(I^*; Y^1, \ldots, Y^r) \\
&= I(I^*; Y^1, \ldots, Y^{r-1}) + I(I^*; Y^r \mid Y^1, \ldots, Y^{r-1}) \\
&= B_r + \mathbb{E}_{y^1, \ldots, y^{r-1}} I(I^*; Y^r \mid y^1, \ldots, y^{r-1}) \\
&\le B_r + C m_r \mathbb{E}(b_r + 1) \\
&= (B_r + 1)(C m_r + 1) - 1 \\
&\le C' m_r (B_r + 1)
\end{aligned}$$

for some constant $C'$. Then for some constant $D > C$,

$$I(I^*; Y^1, \ldots, Y^R) = B_{R+1} \le D^R \prod_{i=1}^{R} m_i$$

as desired. □
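The recursion in the proof of Theorem 3.2 can be iterated numerically. The Python sketch below takes the per-round bound with equality, starting from $B_1 = 0$, and compares the result with $D^R \prod_r m_r$ for the choice $D = C + 1$. The value of `C` is a placeholder; the proof only asserts that some constant exists.

```python
import math


def information_recursion(ms, C):
    """Iterate B_{r+1} = (B_r + 1)(C m_r + 1) - 1 starting from B_1 = 0.

    This is the largest value the per-round bound allows; returns B_{R+1}.
    """
    B = 0.0
    for m in ms:
        B = (B + 1) * (C * m + 1) - 1
    return B


def product_bound(ms, C):
    """D^R * prod_r m_r with D = C + 1 (valid since C m + 1 <= (C + 1) m when m >= 1)."""
    return (C + 1) ** len(ms) * math.prod(ms)


C = 2.0  # placeholder for the unspecified constant in the proof
print(information_recursion([3, 5, 2], C), product_bound([3, 5, 2], C))  # 384.0 810.0
print(information_recursion([10], C), information_recursion([5, 5], C))  # 20.0 120.0
```

The second line shows why rounds matter in this bound: with ten measurements and $C = 2$, one round allows 20 bits while two rounds of five allow 120, because each round's gain is multiplied by what is already known.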

Corollary 3.3. Any scheme using $R$ rounds with $m$ measurements has

$$I(I^*; Y^1, \ldots, Y^R) \le (Cm/R)^R$$

for some constant $C$. Thus for sparse recovery, $m = \Omega(R \log^{1/R} n)$. Minimizing over $R$, we find that $m = \Omega(\log\log n)$ independent of $R$.

Proof. The equation follows from the AM-GM inequality. Furthermore, our setup is such that Bob can recover $I^*$ from $Y$ with large probability, so $I(I^*; Y) = \Omega(\log n)$; this was formally shown in Lemma 6.3 of [HIKP12] (modifying Lemma 4.3 of [PW11] to adaptive measurements and $\epsilon = \Theta(1)$). The result follows. □
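The last step of Corollary 3.3 is an optimization over $R$. Ignoring the constant $C$, the corollary requires $m \gtrsim R (\log n)^{1/R}$ with $R$ rounds; over real $R$ this is minimized at $R = \ln \log n$, where it equals $e \ln \log n = \Theta(\log\log n)$. The Python sketch below treats the constant as 1 and finds the best integer $R$ by brute force; `best_round_count` is a hypothetical name.

```python
import math


def best_round_count(log_n, max_rounds=64):
    """Minimize the proxy R * (log n)^(1/R) over integer R in [1, max_rounds].

    Up to constants, Corollary 3.3 forces m >= R * (log n)^(1/R) / C with R rounds.
    Returns (best_R, proxy value at best_R).
    """
    best = min(range(1, max_rounds + 1), key=lambda R: R * log_n ** (1.0 / R))
    return best, best * log_n ** (1.0 / best)


for log_n in (64, 1024, 2 ** 20):
    R, value = best_round_count(log_n)
    print(log_n, R, round(value, 2), round(math.e * math.log(log_n), 2))
```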
4 Lower bound for dependence on k and ε for ℓ1/ℓ1



In Section 4.1 we establish a new lower bound on the communication complexity of a two-party communication problem that we call $\mathrm{Multi}\ell_\infty$. In Section 4.2 we then show how to use an adaptive $(1 + \epsilon)$-approximate $\ell_1/\ell_1$ sparse recovery scheme $\mathcal{A}$ to solve the communication problem $\mathrm{Multi}\ell_\infty$. By the communication lower bound in Section 4.1, we obtain a lower bound on the number of measurements required of $\mathcal{A}$.

4.1 Direct sum for distributional ℓ∞

We assume basic familiarity with communication complexity; see the textbook of Kushilevitz and 
Nisan [KN97] for further background. Our reason for using communication complexity is to prove 
lower bounds, and we will do so by using information-theoretic arguments. We refer the reader to the 
thesis of Bar-Yossef [Bar02] for a comprehensive introduction to information-theoretic arguments 
used in communication complexity. 

We consider two-party randomized communication complexity. There are two parties, Alice and Bob, with input vectors $x$ and $y$ respectively, and their goal is to solve a promise problem $f(x, y)$. The parties have private randomness. The communication cost of a protocol is its maximum transcript length, over all possible inputs and random coin tosses. The randomized communication complexity $R_\delta(f)$ is the minimum communication cost of a randomized protocol $\Pi$ which for every input $(x, y)$ outputs $f(x, y)$ with probability at least $1 - \delta$ (over the random coin tosses of the parties). We also study the distributional complexity of $f$, in which the parties are deterministic and the inputs $(x, y)$ are drawn from distribution $\mu$, and a protocol is correct if it succeeds with probability at least $1 - \delta$ in outputting $f(x, y)$, where the probability is now taken over $(x, y) \sim \mu$. We define $D_{\mu, \delta}(f)$ to be the minimum communication cost of a correct protocol $\Pi$.

We consider the following promise problem $\mathrm{Gap}\ell_\infty^B$, where $B$ is a parameter, which was studied in [SS02, BJKS04]. The inputs are pairs $(x, y)$ of $m$-dimensional vectors, with $x_i, y_i \in \{0, 1, 2, \ldots, B\}$ for all $i \in [m]$, with the promise that $(x, y)$ is one of the following types of instance:

• NO instance: for all $i$, $|x_i - y_i| \in \{0, 1\}$, or

• YES instance: there is a unique $i$ for which $|x_i - y_i| = B$, and for all $j \ne i$, $|x_j - y_j| \in \{0, 1\}$.

The goal of a protocol is to decide which of the two cases (NO or YES) the input is in.

Consider the distribution $\sigma$: for each $j \in [m]$, choose a random pair $(Z_j, P_j) \in (\{0, 1, 2, \ldots, B\} \times \{0, 1\}) \setminus \{(0, 1), (B, 0)\}$. If $(Z_j, P_j) = (z, 0)$, then $X_j = z$ and $Y_j$ is uniformly distributed in $\{z, z + 1\}$; if $(Z_j, P_j) = (z, 1)$, then $Y_j = z$ and $X_j$ is uniformly distributed on $\{z - 1, z\}$. Let $Z = (Z_1, \ldots, Z_m)$ and $P = (P_1, \ldots, P_m)$. Next choose a random coordinate $S \in [m]$. For coordinate $S$, replace $(X_S, Y_S)$ with a uniform element of $\{(0, 0), (0, B)\}$. Let $X = (X_1, \ldots, X_m)$ and $Y = (Y_1, \ldots, Y_m)$.
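For concreteness, the following Python sketch samples from $\sigma$ as described above and evaluates $\mathrm{Gap}\ell_\infty^B$. It assumes $B \ge 2$ and reads "a random pair" as uniform over the $2B$ allowed values of $(Z_j, P_j)$; the function names are hypothetical. Every sample satisfies the promise, and it is a YES instance exactly when the planted coordinate is $(0, B)$.

```python
import random


def gap_linf(x, y, B):
    """Gap-l_inf^B: 'YES' if some |x_i - y_i| = B, 'NO' if every gap is 0 or 1."""
    gaps = [abs(a - b) for a, b in zip(x, y)]
    if all(g <= 1 for g in gaps):
        return "NO"
    if gaps.count(B) == 1 and all(g <= 1 or g == B for g in gaps):
        return "YES"
    raise ValueError("input violates the promise")


def sample_sigma(m, B, rng):
    """Draw (X, Y, Z, P, S) from sigma; assumes B >= 2 and uniform (Z_j, P_j)."""
    pairs = [(z, q) for z in range(B + 1) for q in (0, 1)
             if (z, q) not in ((0, 1), (B, 0))]
    X, Y, Z, P = [], [], [], []
    for _ in range(m):
        z, q = rng.choice(pairs)
        if q == 0:  # X_j = z and Y_j is uniform on {z, z + 1}
            x, y = z, rng.choice((z, z + 1))
        else:       # Y_j = z and X_j is uniform on {z - 1, z}
            x, y = rng.choice((z - 1, z)), z
        X.append(x)
        Y.append(y)
        Z.append(z)
        P.append(q)
    S = rng.randrange(m)
    X[S], Y[S] = rng.choice(((0, 0), (0, B)))  # the planted coordinate
    return X, Y, Z, P, S


rng = random.Random(0)
X, Y, Z, P, S = sample_sigma(m=8, B=5, rng=rng)
print(X, Y, S, gap_linf(X, Y, B=5))
```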

Using similar arguments to those in [BJKS04], we can show that there are positive, sufficiently small constants $\delta_0$ and $C$ so that for any randomized protocol $\Pi$ which succeeds with probability at least $1 - \delta_0$ on distribution $\sigma$,

$$I(X, Y; \Pi \mid Z, P) \ge \frac{Cm}{B^2}, \qquad (7)$$

where, with some abuse of notation, $\Pi$ is also used to denote the transcript of the corresponding randomized protocol, and here the input $(X, Y)$ is drawn from $\sigma$ conditioned on $(X, Y)$ being a NO instance. Here, $\Pi$ is randomized, and succeeds with probability at least $1 - \delta_0$, where the probability is over the joint space of the random coins of $\Pi$ and the input distribution.
Our starting point for proving (7) is Jayram's lower bound for the conditional mutual information when the inputs are drawn from a related distribution (reference [70] on p. 182 of [Bar02]), but we require several non-trivial modifications to his argument in order to apply it to bound the conditional mutual information for our input distribution, which is $\sigma$ conditioned on $(X, Y)$ being a NO instance. Essentially, we are able to show that the variation distance between our distribution and his distribution is small, and use this to bound the difference in the conditional mutual information between the two distributions. The proof is rather technical, and we postpone it to Appendix C.

We make a few simple refinements to (7). Define the random variable $W$ which is 1 if $(X, Y)$ is a YES instance, and 0 if $(X, Y)$ is a NO instance. Then by definition of the mutual information, if $(X, Y)$ is drawn from $\sigma$ without conditioning on $(X, Y)$ being a NO instance, then since $\Pr[W = 0] = 1/2$ we have

$$\begin{aligned}
I(X, Y; \Pi \mid W, Z, P) &\ge \frac{1}{2} I(X, Y; \Pi \mid Z, P, W = 0) \\
&= \Omega(m/B^2).
\end{aligned}$$

Observe that

$$I(X, Y; \Pi \mid S, W, Z, P) \ge I(X, Y; \Pi \mid W, Z, P) - H(S) = \Omega(m/B^2), \qquad (8)$$

where we assume that $\Omega(m/B^2) - \log m = \Omega(m/B^2)$. Define the constant $\delta_1 = \delta_0/4$. We now define a problem which involves solving $r$ copies of $\mathrm{Gap}\ell_\infty^B$.

Definition 4.1 ($\mathrm{Multi}\ell_\infty^{r,B}$ Problem). There are $r$ pairs of inputs $(x^1, y^1), (x^2, y^2), \ldots, (x^r, y^r)$ such that each pair $(x^i, y^i)$ is a legal instance of the $\mathrm{Gap}\ell_\infty^B$ problem. Alice is given $x^1, \ldots, x^r$. Bob is given $y^1, \ldots, y^r$. The goal is to output a vector $v \in \{\mathrm{NO}, \mathrm{YES}\}^r$, so that for at least a $1 - \delta_1$ fraction of the entries $i$, $v_i = \mathrm{Gap}\ell_\infty^B(x^i, y^i)$.

Remark 4.2. Notice that Definition 4.1 is defining a promise problem. We will study the distributional complexity of this problem under the distribution $\sigma^r$, which is a product distribution on the $r$ instances $(x^1, y^1), (x^2, y^2), \ldots, (x^r, y^r)$.

Theorem 4.3. $D_{\sigma^r, \delta_1}(\mathrm{Multi}\ell_\infty^{r,B}) = \Omega(rm/B^2)$.

Proof. Let $\Pi$ be any deterministic protocol for $\mathrm{Multi}\ell_\infty^{r,B}$ which succeeds with probability at least $1 - \delta_1$ in solving $\mathrm{Multi}\ell_\infty^{r,B}$ when the inputs are drawn from $\sigma^r$, where the probability is taken over the input distribution. We show that $\Pi$ has communication cost $\Omega(rm/B^2)$.

Let $X^1, Y^1, S^1, W^1, Z^1, P^1, \ldots, X^r, Y^r, S^r, W^r, Z^r$, and $P^r$ be the random variables associated with $\sigma^r$, i.e., $X^j, Y^j, S^j, W^j, P^j$ and $Z^j$ correspond to the random variables $X, Y, S, W, Z, P$ associated with the $j$-th independent instance drawn according to $\sigma$, defined above. We let $X = (X^1, \ldots, X^r)$, $X^{<j} = (X^1, \ldots, X^{j-1})$, and $X^{-j}$ equal $X$ without $X^j$. Similarly we define these vectors for $Y, S, W, Z$ and $P$.

By the chain rule for mutual information, $I(X^1, \ldots, X^r, Y^1, \ldots, Y^r; \Pi \mid S, W, Z, P)$ is equal to $\sum_{j=1}^{r} I(X^j, Y^j; \Pi \mid X^{<j}, Y^{<j}, S, W, Z, P)$. Let $V$ be the output of $\Pi$, and $V_j$ be its $j$-th coordinate. For a value $j \in [r]$, we say that $j$ is good if $\Pr_{X,Y}[V_j = \mathrm{Gap}\ell_\infty^B(X^j, Y^j)] \ge 1 - 2\delta_0/3$. Since $\Pi$ succeeds with probability at least $1 - \delta_1 = 1 - \delta_0/4$ in outputting a vector with at least a $1 - \delta_0/4$ fraction of correct entries, the expected probability of success over a random $j \in [r]$ is at least $1 - \delta_0/2$, and so by a Markov argument, there are $\Omega(r)$ good indices $j$.
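The Markov step can be checked with a small Python sketch. If the failure probabilities $f_j$ of the coordinates average at most $\delta_0/2$, then at most a $(\delta_0/2)/(2\delta_0/3) = 3/4$ fraction of them exceed $2\delta_0/3$, so at least a quarter of the indices are good. The failure probabilities generated below are synthetic and hypothetical; only the inequality is being illustrated.

```python
import random


def good_fraction(failure_probs, threshold):
    """Fraction of indices j whose failure probability f_j is at most `threshold`."""
    return sum(f <= threshold for f in failure_probs) / len(failure_probs)


def markov_good_bound(mean_failure, threshold):
    """Markov: Pr_j[f_j > threshold] <= mean / threshold, so good >= 1 - mean / threshold."""
    return 1 - mean_failure / threshold


def simulate(delta0, r=1000, seed=0):
    """Hypothetical per-coordinate failure probabilities with mean at most delta0 / 2."""
    rng = random.Random(seed)
    raw = [rng.random() ** 3 for _ in range(r)]
    scale = (delta0 / 2) / (sum(raw) / r)
    f = [min(1.0, x * scale) for x in raw]  # clipping can only lower the mean
    threshold = 2 * delta0 / 3              # j is good if it fails w.p. <= 2 delta0 / 3
    return good_fraction(f, threshold), markov_good_bound(sum(f) / r, threshold)


print(simulate(0.1))  # (observed good fraction, Markov lower bound)
```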

Fix a value of $j \in [r]$ that is good, and consider $I(X^j, Y^j; \Pi \mid X^{<j}, Y^{<j}, S, W, Z, P)$. By expanding the conditioning, $I(X^j, Y^j; \Pi \mid X^{<j}, Y^{<j}, S, W, Z, P)$ is equal to

$$\mathbb{E}_{x,y,s,w,z,p}\left[ I(X^j, Y^j; \Pi \mid (X^{<j}, Y^{<j}, S^{-j}, W^{-j}, Z^{-j}, P^{-j}) = (x, y, s, w, z, p), S^j, W^j, Z^j, P^j) \right]. \qquad (9)$$
For each $x, y, s, w, z, p$, define a randomized protocol $\Pi_{x,y,s,w,z,p}$ for $\mathrm{Gap}\ell_\infty^B$ under distribution $\sigma$. Suppose that Alice is given $a$ and Bob is given $b$, where $(a, b) \sim \sigma$. Alice sets $X^j = a$, while Bob sets $Y^j = b$. Alice and Bob use $x, y, s, w, z$ and $p$ to set their remaining inputs as follows. Alice sets $X^{<j} = x$ and Bob sets $Y^{<j} = y$. Alice and Bob can randomly set their remaining inputs without any communication, since for $j' > j$, conditioned on $S^{j'}, W^{j'}, Z^{j'}$, and $P^{j'}$, Alice and Bob's inputs are independent. Alice and Bob run $\Pi$ on inputs $X, Y$, and define $\Pi_{x,y,s,w,z,p}(a, b) = V_j$. We say a tuple $(x, y, s, w, z, p)$ is good if

$$\Pr_{X,Y}\left[V_j = \mathrm{Gap}\ell_\infty^B(X^j, Y^j) \mid X^{<j} = x, Y^{<j} = y, S^{-j} = s, W^{-j} = w, Z^{-j} = z, P^{-j} = p\right] \ge 1 - \delta_0.$$

By a Markov argument, and using that $j$ is good, we have $\Pr_{x,y,s,w,z,p}[(x, y, s, w, z, p) \text{ is good}] = \Omega(1)$. Plugging into (9), $I(X^j, Y^j; \Pi \mid X^{<j}, Y^{<j}, S, W, Z, P)$ is at least a constant times

$$\mathbb{E}_{x,y,s,w,z,p}\left[ I(X^j, Y^j; \Pi \mid (X^{<j}, Y^{<j}, S^{-j}, W^{-j}, Z^{-j}, P^{-j}) = (x, y, s, w, z, p), S^j, W^j, Z^j, P^j, (x, y, s, w, z, p) \text{ is good}) \right].$$

For any $(x, y, s, w, z, p)$ that is good, $\Pi_{x,y,s,w,z,p}(a, b) = \mathrm{Gap}\ell_\infty^B(a, b)$ with probability at least $1 - \delta_0$, over the joint distribution of the randomness of $\Pi_{x,y,s,w,z,p}$ and $(a, b) \sim \sigma$. By (8),

$$\mathbb{E}_{x,y,s,w,z,p}\left[ I(X^j, Y^j; \Pi \mid (X^{<j}, Y^{<j}, S^{-j}, W^{-j}, Z^{-j}, P^{-j}) = (x, y, s, w, z, p), S^j, W^j, Z^j, P^j, (x, y, s, w, z, p) \text{ is good}) \right] = \Omega\left(\frac{m}{B^2}\right).$$

Since there are $\Omega(r)$ good indices $j$, we have $I(X^1, \ldots, X^r, Y^1, \ldots, Y^r; \Pi \mid S, W, Z, P) = \Omega(mr/B^2)$. Since the distributional complexity $D_{\sigma^r, \delta_1}(\mathrm{Multi}\ell_\infty^{r,B})$ is at least the minimum of $I(X^1, \ldots, X^r, Y^1, \ldots, Y^r; \Pi \mid S, W, Z, P)$ over deterministic protocols $\Pi$ which succeed with probability at least $1 - \delta_1$ on input distribution $\sigma^r$, it follows that $D_{\sigma^r, \delta_1}(\mathrm{Multi}\ell_\infty^{r,B}) = \Omega(mr/B^2)$. □
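The step in which Alice and Bob fill in the remaining instances without communication rests on the structure of $\sigma$: given $(Z_j, P_j)$, one of $X_j, Y_j$ is fixed and the other is a fair coin over two values, and given $S$ and $W$ the planted coordinate is fixed. The Python sketch below (hypothetical function names, a single instance) shows that each player can therefore sample its own coordinates with private randomness only. In this sketch Alice does not even need $W$, since $X_S = 0$ in both cases.

```python
import random


def fill_alice(z, p, s, rng):
    """Alice's vector X for one sigma-instance, from (Z, P, S) and private coins."""
    x = []
    for j, (zj, pj) in enumerate(zip(z, p)):
        if j == s:
            x.append(0)                         # X_S = 0 whether or not W = 1
        elif pj == 0:
            x.append(zj)                        # X_j = Z_j is fixed
        else:
            x.append(rng.choice((zj - 1, zj)))  # Alice's private coin
    return x


def fill_bob(z, p, s, w, B, rng):
    """Bob's vector Y for the same instance, from (Z, P, S, W) and private coins."""
    y = []
    for j, (zj, pj) in enumerate(zip(z, p)):
        if j == s:
            y.append(B if w == 1 else 0)        # W = 1 plants the gap B
        elif pj == 1:
            y.append(zj)                        # Y_j = Z_j is fixed
        else:
            y.append(rng.choice((zj, zj + 1)))  # Bob's private coin
    return y


z, p, s, B = [3, 1, 4, 2], [0, 1, 1, 0], 2, 5
print(fill_alice(z, p, s, random.Random(1)), fill_bob(z, p, s, 1, B, random.Random(2)))
```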

4.2 The overall lower bound 

We use the theorem in the previous subsection with an extension of the method of section 6.3 of 
[PW11]. 

Let $\mathcal{X} \subset \mathbb{R}^n$ be a distribution with $x_i \in \{-n^d, \ldots, n^d\}$ for all $i \in [n]$ and $x \in \mathcal{X}$. Here $d = \Theta(1)$ is a parameter. Given an adaptive compressed sensing scheme $\mathcal{A}$, we define a $(1 + \epsilon)$-approximate $\ell_1/\ell_1$ sparse recovery multiround bit scheme on $\mathcal{X}$ as follows.

Let $A^i$ be the $i$-th (adaptively chosen) measurement matrix of the compressed sensing scheme. We may assume that the union of rows in matrices $A^1, \ldots, A^r$ generated by $\mathcal{A}$ is an orthonormal system, since the rows can be orthogonalized in a post-processing step. We can assume that $r \le n$.

Choose a random $u \in \mathbb{R}^n$ from distribution $N(0, \frac{1}{n^c} \cdot I_{n \times n})$, where $c = \Theta(1)$ is a parameter. We require that the compressed sensing scheme outputs a valid result of $(1 + \epsilon)$-approximate recovery on $x + u$ with probability at least $1 - \delta$, over the choice of $u$ and its random coins. By Yao's minimax principle, we can fix the randomness of the compressed sensing scheme and assume that the scheme is deterministic.

Let $B^1$ be the matrix $A^1$ with entries rounded to $t \log n$ bits for a parameter $t = \Theta(1)$. We compute $B^1 x$. Then, we compute $B^1 x + A^1 u$. From this, we compute $A^2$, using the algorithm specified by $\mathcal{A}$ as if $B^1 x + A^1 u$ were equal to $A^1 x'$ for some $x'$. For this, we use the following lemma, which is Lemma 5.1 of [DIPW10].

Lemma 4.4. Consider any $m \times n$ matrix $A$ with orthonormal rows. Let $B$ be the result of rounding $A$ to $b$ bits per entry. Then for any $v \in \mathbb{R}^n$ there exists an $s \in \mathbb{R}^n$ with $Bv = A(v - s)$ and $\|s\|_1 < n^2 2^{-b} \|v\|_1$.
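Lemma 4.4 can be sanity-checked numerically. The Python sketch below rounds a matrix with orthonormal rows to the nearest multiple of $2^{-b}$ (our reading of "rounding to $b$ bits") and uses one convenient witness, $s = -A^\top (B - A) v$, which satisfies $Bv = A(v - s)$ because $AA^\top = I$. For this witness one can show $\|s\|_1 \le \sqrt{mn}\, 2^{-b-1} \|v\|_1$, well inside the lemma's bound. The witness is our own choice for illustration and is not claimed to be the construction used in [DIPW10].

```python
import random


def orthonormal_rows(m, n, rng):
    """m random orthonormal rows in R^n via Gram-Schmidt (assumes m <= n)."""
    rows = []
    while len(rows) < m:
        v = [rng.gauss(0.0, 1.0) for _ in range(n)]
        for r in rows:
            d = sum(a * b for a, b in zip(v, r))
            v = [a - d * b for a, b in zip(v, r)]
        norm = sum(a * a for a in v) ** 0.5
        if norm > 1e-9:
            rows.append([a / norm for a in v])
    return rows


def round_to_bits(A, b):
    """Round every entry to the nearest multiple of 2^-b."""
    scale = 2 ** b
    return [[round(a * scale) / scale for a in row] for row in A]


def matvec(M, v):
    return [sum(a * x for a, x in zip(row, v)) for row in M]


def witness_s(A, B, v):
    """One valid s with Bv = A(v - s): s = -A^T (B - A) v, using A A^T = I."""
    e = [bv - av for bv, av in zip(matvec(B, v), matvec(A, v))]
    return [-sum(A[i][k] * e[i] for i in range(len(A))) for k in range(len(v))]


rng = random.Random(0)
m, n, bits = 4, 10, 8
A = orthonormal_rows(m, n, rng)
B = round_to_bits(A, bits)
v = [rng.randint(-100, 100) for _ in range(n)]
s = witness_s(A, B, v)
residual = max(abs(l - r) for l, r in zip(matvec(B, v), matvec(A, [a - b for a, b in zip(v, s)])))
print(residual, sum(map(abs, s)), n ** 2 * 2 ** -bits * sum(map(abs, v)))
```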
In general for $i \ge 2$, given $B^1 x + A^1 u, B^2 x + A^2 u, \ldots, B^{i-1} x + A^{i-1} u$ we compute $A^i$, and round to $t \log n$ bits per entry to get $B^i$. The output of the multiround bit scheme is the same as that of the compressed sensing scheme. If the compressed sensing scheme uses $r$ rounds, then the multiround bit scheme uses $r$ rounds. Let $b$ denote the total number of bits in the concatenation of discrete vectors $B^1 x, B^2 x, \ldots, B^r x$.

We give a generalization of Lemma 5.2 of [PW11] which relates bit schemes to sparse recovery 
schemes. Here we need to generalize the relation from non-adaptive schemes to adaptive schemes, 
using Gaussian noise instead of uniform noise, and arguing about multiple rounds of the algorithm. 

Lemma 4.5. For $t = \Theta(1 + c + d)$, a lower bound of $\Omega(b)$ bits for a multiround bit scheme with error probability at most $\delta + 1/n$ implies a lower bound of $\Omega(b/((1 + c + d) \log n))$ measurements for $(1 + \epsilon)$-approximate sparse recovery schemes with failure probability at most $\delta$.

Proof. Let $\mathcal{A}$ be a $(1 + \epsilon)$-approximate adaptive compressed sensing scheme with failure probability $\delta$. We will show that the associated multiround bit scheme has failure probability $\delta + 1/n$.

By Lemma 4.4, for any vector $x \in \{-n^d, \ldots, n^d\}^n$ we have $B^1 x = A^1(x + s)$ for a vector $s$ with $\|s\|_1 \le n^2 2^{-t \log n} \|x\|_1$, so $\|s\|_2 \le n^{2.5 - t} \|x\|_2 \le n^{3.5 + d - t}$. Notice that $u + s \sim N(s, \frac{1}{n^c} \cdot I_{n \times n})$. We use the following quick suboptimal upper bound on the statistical distance between two univariate normal distributions, which suffices for our purposes.

Fact 4.6. (see Section 3 of [Pol05]) The variation distance between $N(\theta_1, 1)$ and $N(\theta_2, 1)$ is $\frac{4\tau}{\sqrt{2\pi}} + O(\tau^2)$, where $\tau = |\theta_1 - \theta_2|/2$.
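Fact 4.6 can be compared with an exact value. Under the convention that the variation distance is $\int |p - q|$ (an assumption on our part; with the convention $\sup_A |P(A) - Q(A)|$ every quantity is halved), the distance between $N(\theta_1, 1)$ and $N(\theta_2, 1)$ equals $2\,\mathrm{erf}(\tau/\sqrt{2})$, whose Taylor expansion starts with $4\tau/\sqrt{2\pi}$. The Python sketch below checks this closed form against a midpoint-rule integral and against the leading term of Fact 4.6; the function names are hypothetical.

```python
import math


def l1_distance_exact(theta1, theta2):
    """int |N(theta1,1) - N(theta2,1)| dx = 2 erf(tau / sqrt(2)), tau = |theta1 - theta2| / 2."""
    tau = abs(theta1 - theta2) / 2
    return 2 * math.erf(tau / math.sqrt(2))


def l1_distance_numeric(theta1, theta2, lo=-12.0, hi=12.0, steps=100_000):
    """Midpoint-rule integral of |phi(x - theta1) - phi(x - theta2)| as a cross-check."""
    def phi(u):
        return math.exp(-u * u / 2) / math.sqrt(2 * math.pi)

    h = (hi - lo) / steps
    total = 0.0
    for k in range(steps):
        x = lo + (k + 0.5) * h
        total += abs(phi(x - theta1) - phi(x - theta2))
    return total * h


def fact_4_6_leading_term(theta1, theta2):
    """The leading term 4 tau / sqrt(2 pi) from Fact 4.6."""
    tau = abs(theta1 - theta2) / 2
    return 4 * tau / math.sqrt(2 * math.pi)


for gap in (0.01, 0.1, 0.5):
    print(gap, l1_distance_exact(0.0, gap), l1_distance_numeric(0.0, gap),
          fact_4_6_leading_term(0.0, gap))
```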

It follows by Fact 4.6 and independence across coordinates, that the variation distance between $N(0, \frac{1}{n^c} \cdot I_{n \times n})$ and $N(s, \frac{1}{n^c} \cdot I_{n \times n})$ is the same as that between $N(0, I_{n \times n})$ and $N(s \cdot n^{c/2}, I_{n \times n})$, which can be upper-bounded as

$$\sum_{i=1}^{n} O\left(n^{c/2} |s_i| + n^c s_i^2\right) = O\left(n^{c/2} \cdot \sqrt{n} \|s\|_2 + n^c \|s\|_2^2\right) = O\left(n^{c/2 + 4 + d - t} + n^{c + 7 + 2d - 2t}\right).$$

It follows that for $t = \Theta(1 + c + d)$, the variation distance is at most $1/n^2$.

Therefore, if $T^1$ is the algorithm which takes $A^1(x + u)$ and produces $A^2$, then $T^1(A^1(x + u)) = T^1(B^1 x + A^1 u)$ with probability at least $1 - 1/n^2$. This follows since $B^1 x + A^1 u = A^1(x + u + s)$ and $u + s$ and $u$ have variation distance at most $1/n^2$.

In the second round, $B^2 x + A^2 u$ is obtained, and importantly we have for the algorithm $T^2$ in the second round, $T^2(A^2(x + u)) = T^2(B^2 x + A^2 u)$ with probability at least $1 - 1/n^2$. This follows since $A^2$ is a deterministic function of $A^1 u$, and $A^1 u$ and $A^2 u$ are independent since $A^1$ and $A^2$ are orthonormal while $u$ is a vector of i.i.d. Gaussians (here we use the rotational invariance / symmetry of Gaussian space). It follows by induction that with probability at least $1 - r/n^2 \ge 1 - 1/n$, the output of the multiround bit scheme agrees with that of $\mathcal{A}$ on input $x + u$.

Hence, if $m_i$ is the number of measurements in round $i$, and $m = \sum_{i=1}^{r} m_i$, then we have a multiround bit scheme using a total of $b = m t \log n = O(m(1 + c + d) \log n)$ bits and with failure probability $\delta + 1/n$. □

The rest of the proof is similar to the proof of the non-adaptive lower bound for $\ell_1/\ell_1$ sparse recovery given in [PW11]. We sketch the proof, referring the reader to [PW11] for some of the details. Fix parameters $B = \Theta(1/\epsilon^{1/2})$, $r = k$, $m = 1/\epsilon^{3/2}$, and $n = k/\epsilon^3$. Given an instance $(x^1, y^1), \ldots, (x^r, y^r)$ of $\mathrm{Multi}\ell_\infty^{r,B}$, we define the input signal $z$ to a sparse recovery problem. We allocate a set $S^i$ of $m$ disjoint coordinates in a universe of size $n$ for each pair $(x^i, y^i)$, and on these coordinates place the vector $y^i - x^i$. The locations turn out to be essential for the proof of Lemma 4.8 below, and are placed uniformly at random among the $n$ total coordinates (subject to the constraint that the $S^i$ are disjoint). Let $\mu$ be the induced distribution on $z$.
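The embedding of a $\mathrm{Multi}\ell_\infty^{r,B}$ instance into the signal $z$ can be sketched in Python as follows. The sets $S^i$ are drawn uniformly at random subject to being disjoint, and $z$ carries $y^i - x^i$ on $S^i$ and zero elsewhere. The toy instances and the function name `embed` are hypothetical; in a YES instance the coordinate with gap $B$ appears in $z$ with magnitude $B$, while in a NO instance every entry of $z$ on $S^i$ has magnitude at most 1.

```python
import random


def embed(instances, n, rng):
    """Place y^i - x^i on a random set S^i of m coordinates, disjoint across instances.

    instances: list of (x^i, y^i) pairs of length-m integer vectors.
    Returns (z, sets), where z has length n and sets[i] lists the coordinates S^i.
    """
    r, m = len(instances), len(instances[0][0])
    if r * m > n:
        raise ValueError("need r * m <= n coordinates")
    coords = rng.sample(range(n), r * m)  # distinct, uniformly random positions
    z = [0] * n
    sets = []
    for i, (x, y) in enumerate(instances):
        S_i = coords[i * m:(i + 1) * m]
        for pos, xj, yj in zip(S_i, x, y):
            z[pos] = yj - xj
        sets.append(S_i)
    return z, sets


B = 5
yes_instance = ([0, 1, 2], [5, 1, 3])  # y - x = (5, 0, 1): YES
no_instance = ([1, 2, 3], [0, 2, 3])   # y - x = (-1, 0, 0): NO
z, sets = embed([yes_instance, no_instance], n=20, rng=random.Random(0))
print([any(abs(z[j]) == B for j in S_i) for S_i in sets])  # [True, False]
```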

Fix a $(1 + \epsilon)$-approximate $k$-sparse recovery multiround bit scheme Alg that uses $b$ bits and succeeds with probability at least $1 - \delta_1/2$ over $z \sim \mu$. Let $S$ be the set of top $k$ coordinates in $z$. As shown in equation (14) of [PW11], Alg has the guarantee that if $w = \mathrm{Alg}(z)$, then

$$\|(w - z)_S\|_1 + \|(w - z)_{[n] \setminus S}\|_1 \le (1 + 2\epsilon) \|z_{[n] \setminus S}\|_1. \qquad (10)$$

(the $1 + 2\epsilon$ instead of the $1 + \epsilon$ factor is to handle the rounding of entries of the $A^i$ and the noise vector $u$). Next is our generalization of Lemma 6.8 of [PW11].

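Lemma 4.7. For $B = \Theta(1/\epsilon^{1/2})$ sufficiently large, suppose that

$$\Pr_{z \sim \mu}\left[\|(w - z)_S\|_1 \le 10\epsilon \cdot \|z_{[n] \setminus S}\|_1\right] \ge 1 - \frac{\delta_1}{2}.$$

Then Alg requires $b = \Omega(k/\epsilon^{1/2})$.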

Proof. We show how to use Alg to solve instances of Multi^oo with probability at least 1 — Si, 
where the probability is over input instances to Multi^P distributed according to a r ', inducing the 
distribution p on z. The lower bound will follow by Theorem 4.3. Let w be the output of Alg. 

Given $x^1, \ldots, x^r$, Alice places $-x^\ell$ on the appropriate coordinates in the set $S^\ell$ used in defining
$z$, obtaining a vector $z_{\text{Alice}}$. Given $y^1, \ldots, y^r$, Bob places the $y^\ell$ on the appropriate coordinates in
$S^\ell$. He thus creates a vector $z_{\text{Bob}}$ for which $z_{\text{Alice}} + z_{\text{Bob}} = z$. In round $i$, Alice transmits $B^i z_{\text{Alice}}$
to Bob, who computes $B^i(z_{\text{Alice}} + z_{\text{Bob}})$ and transmits it back to Alice. Alice can then compute
$B^i(z) + A^i(u)$ for a random $u \sim \mathcal{N}(0, \frac{1}{n^{c}} I_{n \times n})$. We can assume all coordinates of the output vector
$w$ are in the real interval $[0, B]$, since rounding the coordinates to this interval can only decrease
the error.
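The only property of this simulation used here is linearity: since $z = z_{\text{Alice}} + z_{\text{Bob}}$, Bob can return $B^i z$ after receiving $B^i z_{\text{Alice}}$. A minimal sketch, in which the helper names `matvec` and `alice_bob_round` are hypothetical, small integer matrices stand in for $B^i$, and the rounding of entries and the noise term $A^i(u)$ are omitted:

```python
def matvec(M, v):
    """Plain matrix-vector product; M is a list of rows."""
    return [sum(a * b for a, b in zip(row, v)) for row in M]


def alice_bob_round(B_i, z_alice, z_bob):
    """One round of the two-party simulation (sketch, no noise or rounding).

    Alice sends B_i z_alice; Bob adds B_i z_bob and sends the sum back.
    By linearity the result equals B_i (z_alice + z_bob) = B_i z.
    """
    msg_to_bob = matvec(B_i, z_alice)   # Alice -> Bob
    bob_part = matvec(B_i, z_bob)       # computed locally by Bob
    return [a + b for a, b in zip(msg_to_bob, bob_part)]  # Bob -> Alice
```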

To continue the analysis, we use a proof technique of [PW11] (compare the proof of Lemma 6.8 of
[PW11]). For each $\ell$ we say that $S^\ell$ is bad if either

• there is no coordinate $j$ in $S^\ell$ for which $|w_j| \ge B/2$, yet $(x^\ell, y^\ell)$ is a YES instance of Gap$\ell_\infty$, or

• there is a coordinate $j$ in $S^\ell$ for which $|w_j| \ge B/2$, yet either $(x^\ell, y^\ell)$ is a NO instance of Gap$\ell_\infty$
or $j$ is not the unique $j^*$ for which $y^\ell_{j^*} - x^\ell_{j^*} = B$.

The proof of Lemma 6.8 of [PW11] shows that the fraction $C > 0$ of bad $S^\ell$ can be made an
arbitrarily small constant by choosing an appropriate $B = \Theta(1/\epsilon^{1/2})$. Here we
choose $C = \delta_1$. We also condition on $\|u\|_2 < n^{-c}$ for a sufficiently large constant $c > 0$, which
occurs with probability at least $1 - 1/n$. Hence, with probability at least $1 - \delta_1/2 - 1/n > 1 - \delta_1$ (for $n > 2/\delta_1$), we
have a $1 - \delta_1$ fraction of indices $\ell$ for which the following algorithm correctly outputs Gap$\ell_\infty(x^\ell, y^\ell)$:
if there is a $j \in S^\ell$ for which $|w_j| \ge B/2$, output YES; otherwise output NO. It follows by Theorem
4.3 that Alg requires $b = \Omega(k/\epsilon^{1/2})$, independent of the number of rounds. □
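The decision rule in this proof can be written as a short function. This is an illustrative sketch: `decode_instances` is a hypothetical name, $w$ is a list indexed by coordinate, and each block $S^\ell$ is given as a list of 0-indexed coordinates.

```python
def decode_instances(w, blocks, B):
    """Answer each Gap-ell_inf instance from the recovered vector w.

    Instance l is declared YES if some coordinate j in its block S^l has
    |w_j| >= B/2, and NO otherwise.
    """
    return ["YES" if any(abs(w[j]) >= B / 2 for j in S) else "NO"
            for S in blocks]
```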

The next lemma is the same as Lemma 6.9 of [PW11], replacing $\delta$ in the lemma statement there
with the constant $\delta_1$, and observing that the lemma holds for compressed sensing schemes with an
arbitrary number of rounds.
Lemma 4.8. Suppose $\Pr_{z \sim \rho}\big[\|(w - z)_{[n]\setminus S}\|_1 < (1 - 8\epsilon) \cdot \|z_{[n]\setminus S}\|_1\big] \ge \delta_1$. Then Alg requires
$b = \Omega(k\log(1/\epsilon)/\epsilon^{1/2})$.

Proof. As argued in Lemma 6.9 of [PW11], we have $I(w; z) = \Omega(\epsilon m r\log(n/(mr)))$, which implies
that $b = \Omega(\epsilon m r\log(n/(mr)))$, independent of the number of rounds used by Alg, since the only
information about the signal is in the concatenation of the measurements $B^1 z, B^2 z, \ldots$ over all rounds. □
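For concreteness, the arithmetic that turns this bound into the one stated in Lemma 4.8, using the parameters fixed above ($r = k$, $m = 1/\epsilon^{3/2}$, $n = k/\epsilon^3$), is:

```latex
\epsilon m r = \epsilon \cdot \epsilon^{-3/2} \cdot k = \frac{k}{\epsilon^{1/2}},
\qquad
\frac{n}{mr} = \frac{k/\epsilon^{3}}{k/\epsilon^{3/2}} = \epsilon^{-3/2},
\qquad\text{so}\qquad
\epsilon m r \log\frac{n}{mr} = \frac{3}{2}\cdot\frac{k\log(1/\epsilon)}{\epsilon^{1/2}}.
```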

Finally, we combine our Lemma 4.7 and Lemma 4.8 to prove the analogue of Theorem 6.10 of 
[PW11], which completes this section. 

Theorem 4.9. Any $(1 + \epsilon)$-approximate $\ell_1/\ell_1$ recovery scheme with success probability at least
$1 - \delta_1/2 - 1/n$ must make $\Omega(k/(\epsilon^{1/2} \cdot \log(k/\epsilon)))$ measurements.

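Proof. We will lower bound the number of bits used by any $\ell_1/\ell_1$ multiround bit scheme Alg.
If Alg succeeds with probability at least $1 - \delta_1/2$, then in order to satisfy (10), we must have either
$\|(w - z)_S\|_1 \le 10\epsilon \cdot \|z_{[n]\setminus S}\|_1$ or $\|(w - z)_{[n]\setminus S}\|_1 < (1 - 8\epsilon)\|z_{[n]\setminus S}\|_1$, since if both failed, the
left-hand side of (10) would exceed $(10\epsilon + 1 - 8\epsilon)\|z_{[n]\setminus S}\|_1 = (1 + 2\epsilon)\|z_{[n]\setminus S}\|_1$. Since Alg succeeds with
probability at least $1 - \delta_1/2$, it must satisfy the hypothesis of either Lemma 4.7 or Lemma 4.8.
By these two lemmas, it follows that $b = \Omega(k/\epsilon^{1/2})$. Therefore, by Lemma 4.5, and since
$\log n = \log(k/\epsilon^3) = \Theta(\log(k/\epsilon))$, any $(1 + \epsilon)$-approximate $\ell_1/\ell_1$ sparse recovery algorithm succeeding
with probability at least $1 - \delta_1/2 - 1/n$ requires $\Omega(k/(\epsilon^{1/2} \cdot \log(k/\epsilon)))$ measurements. □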
A Relationship between Post-Measurement and Pre-Measurement Noise

In the setting of [ACD11], the goal is to recover a $k$-sparse $x$ from observations of the form $Ax + w$,
where $A$ has unit-norm rows and $w$ is i.i.d. Gaussian with variance $\|x\|_2^2/\epsilon^2$. By ignoring the
(irrelevant) component of $w$ orthogonal to $A$, this is equivalent to recovering $x$ from observations
of the form $A(x + w)$. By contrast, our goal is to recover $x + w$ from observations of the form
$A(x + w)$, and for general $w$ rather than only for Gaussian $w$.

By arguments in [PW11, HIKP12], for Gaussian $w$ the difference between recovering $x$ and
recovering $x + w$ is minor, so any lower bound of $m$ in the [ACD11] setting implies a lower bound
of $\min(m, \epsilon n)$ in our setting. The converse is only true for proofs that use Gaussian $w$, but our
proof fits this category.

B Information Chain Rule with Linear Observations 

Lemma B.1. Suppose $a_i = b_i + w_i$ for $i \in [s]$, and the $w_i$ are independent of each other and of the $b_i$.
Then

$$I(a; b) \le \sum_i I(a_i; b_i).$$

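Proof. Note that $h(a \mid b) = h(a - b \mid b) = h(w \mid b) = h(w)$. Thus

$$\begin{aligned}
I(a; b) &= h(a) - h(a \mid b) = h(a) - h(w) \\
&\le \sum_i \big(h(a_i) - h(w_i)\big) \\
&= \sum_i \big(h(a_i) - h(a_i \mid b_i)\big) = \sum_i I(a_i; b_i),
\end{aligned}$$

where the inequality uses subadditivity, $h(a) \le \sum_i h(a_i)$, together with $h(w) = \sum_i h(w_i)$ (the $w_i$ are independent), and the next step uses $h(a_i \mid b_i) = h(w_i \mid b_i) = h(w_i)$. □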
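As a numerical sanity check (not part of the proof), Lemma B.1 can be tested in a special case that is assumed only for this example: $b \sim \mathcal{N}(0, K)$ and $w \sim \mathcal{N}(0, \operatorname{diag}(d))$. Then $I(a; b) = \frac{1}{2}\ln\det(K + D) - \frac{1}{2}\ln\det D$ and $I(a_i; b_i) = \frac{1}{2}\ln((K_{ii} + d_i)/d_i)$, so the lemma reduces to Hadamard's inequality $\det(K + D) \le \prod_i (K_{ii} + d_i)$. All helper names below are hypothetical.

```python
import math


def det(M):
    """Determinant via Gaussian elimination with partial pivoting."""
    A = [list(row) for row in M]
    n = len(A)
    result = 1.0
    for c in range(n):
        pivot = max(range(c, n), key=lambda r: abs(A[r][c]))
        if A[pivot][c] == 0:
            return 0.0
        if pivot != c:
            A[c], A[pivot] = A[pivot], A[c]
            result = -result  # a row swap flips the sign
        result *= A[c][c]
        for r in range(c + 1, n):
            factor = A[r][c] / A[c][c]
            for k in range(c, n):
                A[r][k] -= factor * A[c][k]
    return result


def gaussian_mi_joint(K, d):
    """I(a; b) in nats for a = b + w, b ~ N(0, K), w ~ N(0, diag(d))."""
    s = len(d)
    K_plus_D = [[K[i][j] + (d[i] if i == j else 0.0) for j in range(s)]
                for i in range(s)]
    return 0.5 * (math.log(det(K_plus_D)) - sum(math.log(di) for di in d))


def gaussian_mi_sum(K, d):
    """Sum over i of I(a_i; b_i) in nats for the same model."""
    return sum(0.5 * math.log((K[i][i] + d[i]) / d[i]) for i in range(len(d)))
```

When $K$ is diagonal (the $b_i$ independent), the two quantities coincide, so the bound of Lemma B.1 is tight in that case.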
C Switching Distributions from Jayram's Distributional Bound

We first sketch a proof of Jayram's lower bound on the distributional complexity of Gap$\ell_\infty$ [Jay02],
then change it to a different distribution that we need for our sparse recovery lower bounds in
Subsection C.1. Let $X, Y \in \{0, 1, \ldots, B\}^m$. Define distribution $\mu^{m,B}$ as follows: for each $j \in [m]$,
choose a random pair $(Z_j, P_j) \in \{0, 1, 2, \ldots, B\} \times \{0, 1\} \setminus \{(0, 1), (B, 0)\}$. If $(Z_j, P_j) = (z, 0)$, then
$X_j = z$ and $Y_j$ is uniformly distributed in $\{z, z + 1\}$; if $(Z_j, P_j) = (z, 1)$, then $Y_j = z$ and $X_j$ is
uniformly distributed in $\{z - 1, z\}$. Let $X = (X_1, \ldots, X_m)$, $Y = (Y_1, \ldots, Y_m)$, $Z = (Z_1, \ldots, Z_m)$,
and $P = (P_1, \ldots, P_m)$.

The other distribution we define is $\sigma^{m,B}$, which is the same as distribution $\sigma$ in Section 4
(we include $m$ and $B$ in the notation here for clarity). It is defined by first drawing $X$ and $Y$
according to distribution $\mu^{m,B}$. Then we pick a random coordinate $S \in [m]$ and replace $(X_S, Y_S)$
with a uniformly random element of the set $\{(0, 0), (0, B)\}$.
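Both distributions can be sampled directly. This sketch assumes that "a random pair" means a uniformly random choice among the $2B$ allowed values of $(Z_j, P_j)$ and that $S$ is uniform on $[m]$; the function names are hypothetical and coordinates are 0-indexed.

```python
import random


def sample_mu(m, B, rng):
    """Draw (X, Y, Z, P) from mu^{m,B}, one coordinate at a time."""
    # Allowed (Z_j, P_j): {0, ..., B} x {0, 1} minus (0, 1) and (B, 0).
    # The two excluded pairs are exactly those that would produce X_j = -1
    # or Y_j = B + 1, so every entry stays in {0, ..., B}.
    pairs = [(z, p) for z in range(B + 1) for p in (0, 1)
             if (z, p) not in ((0, 1), (B, 0))]
    X, Y, Z, P = [], [], [], []
    for _ in range(m):
        z, p = rng.choice(pairs)  # assumed uniform over the 2B pairs
        if p == 0:
            x, y = z, rng.choice((z, z + 1))  # X_j = z, Y_j uniform on {z, z+1}
        else:
            x, y = rng.choice((z - 1, z)), z  # Y_j = z, X_j uniform on {z-1, z}
        X.append(x)
        Y.append(y)
        Z.append(z)
        P.append(p)
    return X, Y, Z, P


def sample_sigma(m, B, rng):
    """Draw (X, Y, S) from sigma^{m,B}: sample from mu^{m,B}, then overwrite a
    uniformly random coordinate S with a uniform element of {(0, 0), (0, B)}."""
    X, Y, _, _ = sample_mu(m, B, rng)
    S = rng.randrange(m)
    X[S], Y[S] = rng.choice(((0, 0), (0, B)))
    return X, Y, S
```

Under $\mu^{m,B}$ every coordinate has $Y_j - X_j \in \{0, 1\}$, while the planted coordinate of $\sigma^{m,B}$ has $Y_S - X_S \in \{0, B\}$.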

Let $\Pi$ be a deterministic protocol that errs with probability at most $\delta$ on input distribution
$\sigma^{m,B}$.

By the chain rule for mutual information, when $X$ and $Y$ are distributed according to $\mu^{m,B}$,

$$I(X, Y; \Pi \mid Z, P) = \sum_{j=1}^{m} I(X_j, Y_j; \Pi \mid X^{<j}, Y^{<j}, Z, P),$$

which is equal to

$$\sum_{j=1}^{m} \mathbb{E}_{x,y,z,p}\Big[I(X_j, Y_j; \Pi \mid Z_j, P_j, X^{<j} = x, Y^{<j} = y, Z^{-j} = z, P^{-j} = p)\Big].$$

Say that an index $j \in [m]$ is good if, conditioned on $S = j$, $\Pi$ succeeds on $\sigma^{m,B}$ with probability at
least $1 - 2\delta$. By a Markov argument, at least $m/2$ of the indices $j$ are good. Fix a good index $j$.

We say that the tuple $(x, y, z, p)$ is good if, conditioned on $S = j$, $X^{<j} = x$, $Y^{<j} = y$, $Z^{-j} = z$,
and $P^{-j} = p$, $\Pi$ succeeds on $\sigma^{m,B}$ with probability at least $1 - 4\delta$. By a Markov bound, with
probability at least $1/2$, $(x, y, z, p)$ is good. Fix a good $(x, y, z, p)$.

We can define a single-coordinate protocol $\Pi_{x,y,z,p,j}$ as follows. The parties use $x$ and $y$ to fill in
their input vectors $X$ and $Y$ for coordinates $j' < j$. They also use $Z^{-j} = z$, $P^{-j} = p$, and private
randomness to fill in their inputs without any communication on the remaining coordinates $j' > j$.
They place their single-coordinate input $(U, V)$ on their $j$-th coordinate. The parties then output
whatever $\Pi$ outputs.

It follows that $\Pi_{x,y,z,p,j}$ is a single-coordinate protocol $\Pi'$ which distinguishes $(0, 0)$ from $(0, B)$
under the uniform distribution with probability at least $1 - 4\delta$. For the single-coordinate problem, we
need to bound $I(X_j, Y_j; \Pi' \mid Z_j, P_j)$ when $(X_j, Y_j)$ is uniformly random from the set $\{(Z_j, Z_j), (Z_j, Z_j + 1)\}$
if $P_j = 0$, and uniformly random from the set $\{(Z_j, Z_j), (Z_j - 1, Z_j)\}$ if $P_j = 1$. By
the same argument as in the proof of Lemma 8.2 of [BJKS04], if $\Pi'_{u,v}$ denotes the distribution on
transcripts induced by inputs $u$ and $v$ and private coins, then we have

$$I(X_j, Y_j; \Pi' \mid Z_j, P_j) \ge \Omega(1/B^2) \cdot \big(h^2(\Pi'_{0,0}, \Pi'_{0,B}) + h^2(\Pi'_{B,0}, \Pi'_{B,B})\big), \qquad (11)$$

where
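$$h(\alpha, \beta) = \sqrt{1 - \sum_{\omega \in \Omega} \sqrt{\alpha(\omega)\beta(\omega)}}$$

is the Hellinger distance between distributions $\alpha$ and $\beta$ on support $\Omega$. For any two distributions $\alpha$
and $\beta$, if we define

$$D_{TV}(\alpha, \beta) = \frac{1}{2}\sum_{\omega \in \Omega} |\alpha(\omega) - \beta(\omega)|$$

to be the variation distance between them, then $\sqrt{2} \cdot h(\alpha, \beta) \ge D_{TV}(\alpha, \beta)$ (see Proposition 2.38 of
[Bar02]).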
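The inequality $\sqrt{2} \cdot h \ge D_{TV}$ can be checked numerically on random distributions over a finite support. The sketch assumes the normalization $h^2(\alpha, \beta) = 1 - \sum_\omega \sqrt{\alpha(\omega)\beta(\omega)}$, under which $h$ lies in $[0, 1]$; the function names are hypothetical.

```python
import math
import random


def hellinger(alpha, beta):
    """Hellinger distance with h^2 = 1 - sum_w sqrt(alpha(w) * beta(w))."""
    overlap = sum(math.sqrt(a * b) for a, b in zip(alpha, beta))
    return math.sqrt(max(0.0, 1.0 - overlap))  # clamp tiny negative rounding


def tv(alpha, beta):
    """Total variation distance (1/2) * sum_w |alpha(w) - beta(w)|."""
    return 0.5 * sum(abs(a - b) for a, b in zip(alpha, beta))


def random_distribution(k, rng):
    """A random probability vector of length k (normalized random weights)."""
    weights = [rng.random() + 1e-12 for _ in range(k)]
    total = sum(weights)
    return [w / total for w in weights]
```

The inequality holds because $D_{TV}^2 \le 1 - (1 - h^2)^2 = h^2(2 - h^2) \le 2h^2$.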

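Finally, since $\Pi'$ succeeds with probability at least $1 - 4\delta$ on the uniform distribution on input
pairs in $\{(0, 0), (0, B)\}$, and any test between two equally likely distributions succeeds with probability at most $(1 + D_{TV})/2$, we have

$$\sqrt{2} \cdot h(\Pi'_{0,0}, \Pi'_{0,B}) \ge D_{TV}(\Pi'_{0,0}, \Pi'_{0,B}) \ge 1 - 8\delta = \Omega(1)$$

for $\delta < 1/8$.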

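Hence,

$$I(X_j, Y_j; \Pi \mid Z_j, P_j, X^{<j} = x, Y^{<j} = y, Z^{-j} = z, P^{-j} = p) = \Omega(1/B^2)$$

for each of the $\Omega(m)$ good $j$ and each good tuple $(x, y, z, p)$, which occurs with probability at least $1/2$. Thus $I(X, Y; \Pi \mid Z, P) = \Omega(m/B^2)$ when inputs $X$ and $Y$ are
distributed according to $\mu^{m,B}$ and $\Pi$ succeeds with probability at least $1 - \delta$ on $X$ and $Y$ distributed
according to $\sigma^{m,B}$.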
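The step from success probability $1 - 4\delta$ to $D_{TV}(\Pi'_{0,0}, \Pi'_{0,B}) = \Omega(1)$ rests on the standard fact that the optimal test between two equally likely distributions succeeds with probability exactly $(1 + D_{TV})/2$. A minimal sketch over a finite transcript space (function names hypothetical):

```python
def best_success_probability(alpha, beta):
    """Success probability of the optimal test between two equally likely
    distributions alpha and beta on the same finite set of outcomes.

    The test guesses alpha where alpha(w) >= beta(w) and beta elsewhere, so it
    succeeds with probability (1/2) * sum_w max(alpha(w), beta(w)).
    """
    return 0.5 * sum(max(a, b) for a, b in zip(alpha, beta))


def tv(alpha, beta):
    """D_TV(alpha, beta) = (1/2) * sum_w |alpha(w) - beta(w)|."""
    return 0.5 * sum(abs(a - b) for a, b in zip(alpha, beta))


def tv_lower_bound(success):
    """Since success <= (1 + D_TV) / 2, a test achieving `success`
    forces D_TV >= 2 * success - 1."""
    return 2 * success - 1
```

With success probability $1 - 4\delta$ this gives $D_{TV} \ge 1 - 8\delta$, a positive constant once $\delta < 1/8$.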

C.1 Changing the distribution

Consider the distribution

$$\zeta^{m,B} = \big(\sigma^{m,B} \mid (X_S, Y_S) = (0, 0)\big).$$

We show $I(X, Y; \Pi \mid Z, P) = \Omega(m/B^2)$ when $X$ and $Y$ are distributed according to $\zeta^{m,B}$ rather than
according to $\mu^{m,B}$.

For $X$ and $Y$ distributed according to $\zeta^{m,B}$, by the chain rule we again have that $I(X, Y; \Pi \mid Z, P)$
is equal to

$$\sum_{j=1}^{m} \mathbb{E}_{x,y,z,p}\Big[I(X_j, Y_j; \Pi \mid Z_j, P_j, X^{<j} = x, Y^{<j} = y, Z^{-j} = z, P^{-j} = p)\Big].$$

Again, say that an index $j \in [m]$ is good if, conditioned on $S = j$, $\Pi$ succeeds on $\sigma^{m,B}$ with
probability at least $1 - 2\delta$. By a Markov argument, at least $m/2$ of the indices $j$ are good. Fix a
good index $j$.

Again, we say that the tuple $(x, y, z, p)$ is good if, conditioned on $S = j$, $X^{<j} = x$, $Y^{<j} = y$,
$Z^{-j} = z$, and $P^{-j} = p$, $\Pi$ succeeds on $\sigma^{m,B}$ with probability at least $1 - 4\delta$. By a Markov bound,
with probability at least $1/2$, $(x, y, z, p)$ is good. Fix a good $(x, y, z, p)$.

As before, we can define a single-coordinate protocol $\Pi_{x,y,z,p,j}$. The parties use $x$ and $y$ to fill
in their input vectors $X$ and $Y$ for coordinates $j' < j$. They can also use $Z^{-j} = z$, $P^{-j} = p$, and
private randomness to fill in their inputs without any communication on the remaining coordinates
$j' > j$. They place their single-coordinate input $(U, V)$, drawn uniformly from $\{(0, 0), (0, B)\}$,
on their $j$-th coordinate. The parties output whatever $\Pi$ outputs. Let $\Pi'$ denote $\Pi_{x,y,z,p,j}$ for
notational convenience.
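The way the parties fill in their inputs in $\Pi_{x,y,z,p,j}$ can be sketched as follows. The key point is that once $(Z_{j'}, P_{j'})$ is known to both parties, only one party's entry on coordinate $j'$ is random, so that party can sample it with private coins. The function name and argument layout are hypothetical, coordinates are 0-indexed, the entries `z[j']`, `p[j']` for $j' \le j$ are not used, and a single `rng` stands in for both parties' private coins.

```python
import random


def embed_single_coordinate(j, x_prefix, y_prefix, z, p, u, v, rng):
    """Build full inputs (X, Y) for the protocol Pi_{x,y,z,p,j} (sketch).

    Coordinates j' < j come from the fixed prefixes x, y; coordinate j holds
    the single-coordinate input (u, v); each coordinate j' > j is completed
    from the shared (z[j'], p[j']) as under mu^{m,B}, without communication.
    """
    m = len(z)
    assert len(x_prefix) == len(y_prefix) == j
    X = list(x_prefix) + [u]
    Y = list(y_prefix) + [v]
    for jp in range(j + 1, m):
        if p[jp] == 0:
            X.append(z[jp])                           # Alice: X_j' = Z_j'
            Y.append(rng.choice((z[jp], z[jp] + 1)))  # Bob's private coin
        else:
            X.append(rng.choice((z[jp] - 1, z[jp])))  # Alice's private coin
            Y.append(z[jp])                           # Bob: Y_j' = Z_j'
    return X, Y
```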

The first issue is that, unlike before, $\Pi'$ is not guaranteed to have success probability at least
$1 - 4\delta$, since $\Pi$ is not being run on input distribution $\sigma^{m,B}$ in this reduction. The second issue is in
bounding $I(X_j, Y_j; \Pi' \mid Z_j, P_j)$, since $(X_j, Y_j)$ is now drawn from the marginal distribution of $\zeta^{m,B}$
on coordinate $j$.
Notice that $S \neq j$ with probability $1 - 1/m$, which we condition on. This immediately resolves
the second issue, since now the marginal distribution on $(X_j, Y_j)$ is the same under $\zeta^{m,B}$ as it was
under $\sigma^{m,B}$; namely, $(X_j, Y_j)$ is uniformly random from the set
$\{(Z_j, Z_j), (Z_j, Z_j + 1)\}$ if $P_j = 0$, and uniformly random from the set $\{(Z_j, Z_j), (Z_j - 1, Z_j)\}$ if $P_j = 1$.

We now address the first issue. After conditioning on $S \neq j$, we have that $(X^{-j}, Y^{-j})$ is drawn
from $\zeta^{m-1,B}$. If instead $(X^{-j}, Y^{-j})$ were drawn from $\mu^{m-1,B}$, then after placing $(U, V)$ the input
to $\Pi$ would be drawn from $\sigma^{m,B}$ conditioned on a good tuple. Hence in that case, $\Pi'$ would succeed
with probability at least $1 - 4\delta$. Thus for our actual distribution on $(X^{-j}, Y^{-j})$, after conditioning on
$S \neq j$, the success probability of $\Pi'$ is at least

$$1 - 4\delta - D_{TV}(\mu^{m-1,B}, \zeta^{m-1,B}).$$

Let $C^{\mu,m-1,B}$ be the random variable which counts the number of coordinates $i$ for which
$(X_i, Y_i) = (0, 0)$ when $X$ and $Y$ are drawn from $\mu^{m-1,B}$. Let $C^{\zeta,m-1,B}$ be a random variable
which counts the number of coordinates $i$ for which $(X_i, Y_i) = (0, 0)$ when $X$ and $Y$ are drawn
from $\zeta^{m-1,B}$. Observe that $(X_i, Y_i) = (0, 0)$ in $\mu$ only if $P_i = 0$ and $Z_i = 0$, which happens with
probability $1/(2B)$. Hence, $C^{\mu,m-1,B}$ is distributed as $\mathrm{Binomial}(m - 1, 1/(2B))$, while $C^{\zeta,m-1,B}$ is
distributed as $\mathrm{Binomial}(m - 2, 1/(2B)) + 1$. We use $\mu'$ to denote the distribution of $C^{\mu,m-1,B}$ and $\zeta'$
to denote the distribution of $C^{\zeta,m-1,B}$. Also, let $\iota$ denote the $\mathrm{Binomial}(m - 2, 1/(2B))$ distribution.
Conditioned on $C^{\mu,m-1,B}$ and $C^{\zeta,m-1,B}$ taking the same value, $\mu^{m-1,B}$ and $\zeta^{m-1,B}$ are equal as distributions,
and so

$$D_{TV}(\mu^{m-1,B}, \zeta^{m-1,B}) \le D_{TV}(\mu', \zeta').$$

We use the following fact:

Fact C.1 (see, e.g., Fact 2.4 of [GMRZ11]). Any binomial distribution $X$ with variance equal to
$\sigma^2$ satisfies $D_{TV}(X, X + 1) \le 2/\sigma$.
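Fact C.1 can be checked exactly for particular parameters by computing both probability mass functions. A minimal sketch (hypothetical helper names; exact pmfs via `math.comb`, available in Python 3.8 and later):

```python
import math


def binomial_pmf(n, q):
    """Exact Binomial(n, q) probability mass function on {0, ..., n}."""
    return [math.comb(n, k) * q**k * (1 - q) ** (n - k) for k in range(n + 1)]


def tv_binomial_shift(n, q):
    """D_TV(X, X + 1) for X ~ Binomial(n, q), on the common support {0, ..., n + 1}."""
    pmf = binomial_pmf(n, q)
    law_x = pmf + [0.0]          # X never equals n + 1
    law_x_plus_1 = [0.0] + pmf   # X + 1 never equals 0
    return 0.5 * sum(abs(a - b) for a, b in zip(law_x, law_x_plus_1))
```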

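By definition,

$$\mu' = \Big(1 - \frac{1}{2B}\Big) \cdot \iota + \frac{1}{2B} \cdot \zeta'.$$

Since the variance of the $\mathrm{Binomial}(m - 2, 1/(2B))$ distribution is

$$\frac{m - 2}{2B} \cdot \Big(1 - \frac{1}{2B}\Big) = \frac{m}{2B}\,(1 - o(1)),$$

and $\zeta'$ is $\iota$ shifted by $1$, applying Fact C.1 we have

$$\begin{aligned}
D_{TV}(\mu', \zeta') &= D_{TV}\Big(\Big(1 - \frac{1}{2B}\Big) \cdot \iota + \frac{1}{2B} \cdot \zeta',\ \zeta'\Big) \\
&= \frac{1}{2}\Big\|\Big(1 - \frac{1}{2B}\Big) \cdot \iota + \frac{1}{2B} \cdot \zeta' - \zeta'\Big\|_1 \\
&= \Big(1 - \frac{1}{2B}\Big) \cdot D_{TV}(\iota, \zeta') \\
&\le 2\sqrt{\frac{2B}{m}}\,(1 + o(1)).
\end{aligned}$$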




It follows that the success probability of $\Pi'$ is at least

$$1 - 4\delta - O\big(\sqrt{B/m}\big) \ge 1 - 5\delta,$$

provided $m$ is sufficiently large compared to $B/\delta^2$.
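The total variation computation for $\mu'$ and $\zeta'$ can be checked numerically. The sketch builds $\iota$, $\zeta' = \iota + 1$, and $\mu' = \mathrm{Binomial}(m - 1, 1/(2B))$ exactly, then confirms that $\mu'$ equals the mixture $(1 - \frac{1}{2B})\iota + \frac{1}{2B}\zeta'$, that $D_{TV}(\mu', \zeta') = (1 - \frac{1}{2B})D_{TV}(\iota, \zeta')$, and that this is at most the bound $2/\sigma$ from Fact C.1. The helper names and the sample values $m = 400$, $B = 5$ are hypothetical.

```python
import math


def binomial_pmf(n, q):
    """Exact Binomial(n, q) probability mass function on {0, ..., n}."""
    return [math.comb(n, k) * q**k * (1 - q) ** (n - k) for k in range(n + 1)]


def tv(alpha, beta):
    """Total variation distance between two pmfs on the same support."""
    return 0.5 * sum(abs(a - b) for a, b in zip(alpha, beta))


def mixture_tv_terms(m, B):
    """Return (D_TV(mu', zeta'), (1 - q) D_TV(iota, zeta'), 2 / sigma_iota,
    max pointwise gap between mu' and the mixture), with q = 1/(2B) and all
    pmfs on the common support {0, ..., m - 1}."""
    q = 1 / (2 * B)
    iota = binomial_pmf(m - 2, q) + [0.0]   # Binomial(m - 2, q)
    zeta = [0.0] + binomial_pmf(m - 2, q)   # zeta' = iota shifted by +1
    mu = binomial_pmf(m - 1, q)             # mu' = Binomial(m - 1, q)
    mixture = [(1 - q) * a + q * b for a, b in zip(iota, zeta)]
    gap = max(abs(a - b) for a, b in zip(mu, mixture))
    sigma = math.sqrt((m - 2) * q * (1 - q))
    return tv(mu, zeta), (1 - q) * tv(iota, zeta), 2 / sigma, gap
```

The mixture identity holds because a $\mathrm{Binomial}(m - 1, q)$ variable is a $\mathrm{Binomial}(m - 2, q)$ variable plus an independent $\mathrm{Bernoulli}(q)$.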
Let $E$ be an indicator random variable for the event that $S \neq j$. Then $H(E) = O((\log m)/m)$.
Hence,

$$\begin{aligned}
I(X_j, Y_j; \Pi' \mid Z_j, P_j) &\ge I(X_j, Y_j; \Pi' \mid Z_j, P_j, E) - O((\log m)/m) \\
&\ge (1 - 1/m) \cdot I(X_j, Y_j; \Pi' \mid Z_j, P_j, S \neq j) - O((\log m)/m) \\
&= \Omega(1/B^2),
\end{aligned}$$

where we assume that $\Omega(1/B^2) - O((\log m)/m) = \Omega(1/B^2)$.

Hence, $I(X, Y; \Pi \mid Z, P) = \Omega(m/B^2)$ when inputs $X$ and $Y$ are distributed according to $\zeta^{m,B}$
and $\Pi$ succeeds with probability at least $1 - \delta$ on $X$ and $Y$ distributed according to $\sigma^{m,B}$.
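The bound $H(E) = O((\log m)/m)$ holds because $E$ is a Bernoulli random variable with $\Pr[E = 0] = \Pr[S = j] = 1/m$, and the binary entropy satisfies $H(1/m) \le (\ln m + 1)/m$ in nats, using $-(1 - q)\ln(1 - q) \le q$. A quick check (the helper name is hypothetical):

```python
import math


def binary_entropy(q):
    """Entropy in nats of a Bernoulli(q) random variable."""
    if q <= 0.0 or q >= 1.0:
        return 0.0
    return -q * math.log(q) - (1 - q) * math.log(1 - q)
```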